Smart Paper Notes

Discovering ordinary differential equations that govern time-series

Sören Becker , Michal Klein , Alexander Neitz , Giambattista Parascandolo , Niki Kilbertus

Categories: Machine Learning

2022-11-05

Natural laws are often described through differential equations, yet finding a differential equation that describes the governing law underlying observed data is a challenging and still mostly manual task. In this paper we take a step towards automating this process: we propose a transformer-based sequence-to-sequence model that recovers scalar autonomous ordinary differential equations (ODEs) in symbolic form from time-series data of a single observed solution of the ODE. Our method scales efficiently: after one-time pretraining on a large set of ODEs, we can infer the governing law of a new observed solution in a few forward passes of the model. We then show that our model performs better than or on par with existing methods in various test cases in terms of accurate symbolic recovery of the ODE, especially for more complex expressions.

translated by Google Translate

Supervised Contrastive Learning to Classify Paranasal Anomalies in the Maxillary Sinus

Debayan Bhattacharya , Benjamin Tobias Becker , Finn Behrendt , Marcel Bengs , Dirk Beyersdorff , Dennis Eggert , Elina Petersen , Florian Jansen , Marvin Petersen , Bastian Cheng

Categories: Computer Vision | Machine Learning

2022-09-05

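Using deep learning techniques, anomalies in the paranasal sinus system can be detected automatically in MRI images and then analyzed and classified by volume, shape, and other parameters such as local contrast. However, because training data are limited, traditional supervised learning methods often fail to generalize. Existing deep learning methods for paranasal anomaly classification diagnose at most one anomaly. In our work, we consider three anomalies. Specifically, we employ a 3D CNN to separate maxillary sinus volumes without anomalies from maxillary sinus volumes with anomalies. To learn robust representations from a small labeled dataset, we propose a novel learning paradigm that combines a contrastive loss with a cross-entropy loss. In particular, we use a supervised contrastive loss that encourages the embeddings of maxillary sinus volumes with and without anomalies to form two distinct clusters, while the cross-entropy loss encourages the 3D CNN to maintain its discriminative ability. We report that optimizing with both losses is advantageous over optimizing with only one loss. We also find that our training strategy improves label efficiency. With our method, a 3D CNN classifier achieves an AUROC of 0.85, whereas a 3D CNN classifier optimized with cross-entropy loss achieves an AUROC of 0.66.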
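
The combination of a supervised contrastive loss and a cross-entropy loss described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the `temperature` value, and the `weight` that balances the two terms are assumptions, since the abstract does not say how the losses are weighted. The contrastive term follows the commonly used supervised contrastive formulation on L2-normalized embeddings.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Pull same-label embeddings together and push other embeddings apart."""
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without a positive contributes nothing
        # Temperature-scaled cosine similarity to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        log_denom = math.log(sum(math.exp(s) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


def cross_entropy_loss(logits, labels):
    """Mean softmax cross-entropy over a batch of per-class logits."""
    total = 0.0
    for row, y in zip(logits, labels):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_sum_exp - row[y]
    return total / len(labels)


def combined_loss(embeddings, logits, labels, weight=1.0, temperature=0.1):
    """Hypothetical weighting: cross-entropy plus weight * contrastive term."""
    contrastive = supervised_contrastive_loss(embeddings, labels, temperature)
    return cross_entropy_loss(logits, labels) + weight * contrastive
```

With embeddings that already form two label-consistent clusters, the contrastive term is close to zero, so the cross-entropy term dominates the combined loss.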

Evaluating Machine Unlearning via Epistemic Uncertainty

Alexander Becker , Thomas Liebig

Categories: Machine Learning

2022-08-23

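There has recently been growing interest in machine unlearning, primarily due to legal requirements such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act. As a result, multiple approaches have been proposed to remove the influence of specific target data points from a trained model. However, when evaluating the success of unlearning, current approaches either use adversarial attacks or compare their results to the optimal solution, which usually involves retraining from scratch. We argue that both ways are insufficient in practice. In this work, we present an evaluation metric for machine unlearning algorithms based on epistemic uncertainty. To the best of our knowledge, this is the first definition of a general evaluation metric for machine unlearning.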

"GAN I hire you?" -- A System for Personalized Virtual Job Interview Training

Alexander Heimerl , Silvan Mertes , Tanja Schneeberger , Tobias Baur , Ailin Liu , Linda Becker , Nicolas Rohleder , Patrick Gebhard , Elisabeth André

Categories: Machine Learning

2022-06-08

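Job interviews are usually high-stakes social situations in which professional and behavioral skills are required for a satisfactory outcome. Professional job interview trainers give educative feedback on the displayed behavior according to common standards. Such feedback can help improve the behavioral skills needed for job interviews. A technological approach to generating such feedback might be a playful and low-key starting point for job interview training. We therefore extended an interactive virtual job interview training system with a Generative Adversarial Network (GAN)-based approach that first detects behavioral weaknesses and subsequently generates personalized feedback. To evaluate the usefulness of the generated feedback, we conducted a mixed-methods pilot study using mock-ups of the job interview training system. The overall study results indicate that the GAN-based generated behavioral feedback is helpful. Moreover, participants assessed that the feedback would improve their job interview performance.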

FiLM-Ensemble: Probabilistic Deep Learning via Feature-wise Linear Modulation

Mehmet Ozgur Turkoglu , Alexander Becker , Hüseyin Anil Gündüz , Mina Rezaei , Bernd Bischl , Rodrigo Caye Daudt , Stefano D'Aronco , Jan Dirk Wegner , Konrad Schindler

Categories: Machine Learning

2022-05-31

The ability to estimate epistemic uncertainty is often crucial when deploying machine learning in the real world, but modern methods often produce overconfident, uncalibrated uncertainty predictions. A common approach to quantify epistemic uncertainty, usable across a wide class of prediction models, is to train a model ensemble. In a naive implementation, the ensemble approach has high computational cost and high memory demand. This is a particular challenge for modern deep learning, where even a single deep network is already demanding in terms of compute and memory, and it has given rise to a number of attempts to emulate the model ensemble without actually instantiating separate ensemble members. We introduce FiLM-Ensemble, a deep, implicit ensemble method based on the concept of Feature-wise Linear Modulation (FiLM). That technique was originally developed for multi-task learning, with the aim of decoupling different tasks. We show that the idea can be extended to uncertainty quantification: by modulating the network activations of a single deep network with FiLM, one obtains a model ensemble with high diversity, and consequently well-calibrated estimates of epistemic uncertainty, with low computational overhead in comparison. Empirically, FiLM-Ensemble outperforms other implicit ensemble methods, and it comes very close to the upper bound of an explicit ensemble of networks (sometimes even beating it), at a fraction of the memory cost.
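
As a rough illustration of the idea in this abstract, the sketch below applies feature-wise linear modulation (an element-wise scale `gamma` and shift `beta`) to the activations of one shared layer, once per implicit ensemble member, and averages the resulting class probabilities. The single linear `head`, the function names, and the choice to modulate only one layer are simplifying assumptions made here for illustration; the abstract does not specify where or how the modulation is applied in the network.

```python
import math


def film(features, gamma, beta):
    """Feature-wise linear modulation: scale and shift each feature."""
    return [g * h + b for h, g, b in zip(features, gamma, beta)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def film_ensemble_predict(features, members, head):
    """Run one shared head once per member's (gamma, beta) pair.

    members: list of (gamma, beta) pairs, one per implicit ensemble member.
    head: shared weight rows, one row per output class (hypothetical layout).
    Returns the averaged class probabilities and the per-member probabilities.
    """
    per_member = []
    for gamma, beta in members:
        modulated = film(features, gamma, beta)
        logits = [sum(w * x for w, x in zip(row, modulated)) for row in head]
        per_member.append(softmax(logits))
    n_classes = len(head)
    mean = [sum(p[c] for p in per_member) / len(per_member)
            for c in range(n_classes)]
    return mean, per_member
```

Only the small `gamma` and `beta` vectors differ between members while the remaining weights are shared, which is why the memory footprint stays well below that of an explicit ensemble. Disagreement between the per-member probabilities can then serve as an indicator of epistemic uncertainty.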

Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning

Alexander Becker , Stefania Russo , Stefano Puliti , Nico Lang , Konrad Schindler , Jan Dirk Wegner

Categories: Computer Vision | Machine Learning

2021-11-25

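Monitoring and managing Earth's forests in an informed manner is an important requirement for addressing challenges such as biodiversity loss and climate change. While traditional in situ or aerial campaigns for forest assessment provide accurate data for analysis at the regional level, scaling them to entire countries and beyond at high resolution is hardly possible. In this work, we propose a Bayesian deep learning approach to densely estimate forest structure variables at country scale with 10-meter resolution, using freely available satellite imagery as input. Our method jointly transforms Sentinel-2 optical images and Sentinel-1 synthetic aperture radar images into maps of five different forest structure variables: 95th height percentile, mean height, density, Gini coefficient, and fractional cover. We train and test our model on data from 41 airborne laser scanning missions across Norway and demonstrate that it generalizes to unseen test regions, achieving a normalized mean error between 11% and 15%, depending on the variable. Our work is also the first to propose a Bayesian deep learning approach to predict forest structure variables with well-calibrated uncertainty estimates. These increase the trustworthiness of the model and its suitability for downstream tasks that require reliable confidence estimates, such as informed decision making. We present an extensive set of experiments to validate the accuracy of the predicted maps as well as the quality of the predicted uncertainties. To demonstrate scalability, we provide Norway-wide maps for the five forest structure variables.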

Benchmarking common uncertainty estimation methods with histopathological images under domain shift and label noise

Hendrik A. Mehrtens , Alexander Kurz , Tabea-Clara Bucher , Titus J. Brinker

Categories: Computer Vision | Machine Learning

2023-01-03

In recent years, deep learning has seen increasing use in histopathological applications. However, while these approaches have shown great potential, in high-risk environments deep learning models need to be able to judge their own uncertainty and to reject inputs when there is a significant chance of misclassification. In this work, we conduct a rigorous evaluation of the most commonly used uncertainty and robustness methods for the classification of Whole-Slide-Images under domain shift using the H&E stained Camelyon17 breast cancer dataset. Although it is known that histopathological data can be subject to strong domain shift and label noise, to our knowledge this is the first work that compares the most common methods for uncertainty estimation under these aspects. In our experiments, we compare Stochastic Variational Inference, Monte-Carlo Dropout, Deep Ensembles, Test-Time Data Augmentation as well as combinations thereof. We observe that ensembles of methods generally lead to higher accuracies and better calibration and that Test-Time Data Augmentation can be a promising alternative when choosing an appropriate set of augmentations. Across methods, a rejection of the most uncertain tiles leads to a significant increase in classification accuracy on both in-distribution and out-of-distribution data. Furthermore, we conduct experiments comparing these methods under varying conditions of label noise. We observe that the border regions of the Camelyon17 dataset are subject to label noise and evaluate the robustness of the included methods against different noise levels. Lastly, we publish our code framework to facilitate further research on uncertainty estimation on histopathological data.
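
The rejection of the most uncertain tiles mentioned in this abstract can be illustrated with a small sketch: sort tiles by an uncertainty score, drop the most uncertain fraction, and compute accuracy on what remains. The function name and the use of a single scalar uncertainty per tile are assumptions made for illustration; the abstract does not state which uncertainty score each method uses.

```python
def accuracy_after_rejection(correct, uncertainty, reject_fraction):
    """Accuracy on the tiles kept after dropping the most uncertain ones.

    correct: per-tile booleans (True if the prediction matched the label).
    uncertainty: per-tile scores, where larger means less certain.
    reject_fraction: share of tiles to reject, in the range [0, 1).
    """
    if not correct or len(correct) != len(uncertainty):
        raise ValueError("inputs must be non-empty and of equal length")
    if not 0.0 <= reject_fraction < 1.0:
        raise ValueError("reject_fraction must be in [0, 1)")
    # Indices ordered from most certain to least certain.
    order = sorted(range(len(correct)), key=lambda i: uncertainty[i])
    keep = len(correct) - int(len(correct) * reject_fraction)
    kept = order[:keep]
    return sum(1 for i in kept if correct[i]) / len(kept)
```

If the uncertainty scores are informative, misclassified tiles concentrate among the rejected ones, so the accuracy on the kept tiles rises as `reject_fraction` grows.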

Computational Charisma -- A Brick by Brick Blueprint for Building Charismatic Artificial Intelligence

Björn W. Schuller , Shahin Amiriparian , Anton Batliner , Alexander Gebhard , Maurice Gerzcuk , Vincent Karas , Alexander Kathan , Lennart Seizer , Johanna Löchner

Categories: Artificial Intelligence | Computer Vision | Machine Learning

2022-12-31

Charisma is considered to be one's ability to attract and potentially also influence others. Clearly, there can be considerable interest from an artificial intelligence (AI) perspective in providing it with such skill. Beyond that, a plethora of use cases opens up for computational measurement of human charisma, such as tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. A number of models exist that base charisma on various dimensions, often following the idea that charisma is given if someone could and would help others. Examples include influence (could help) and affability (would help) in scientific studies or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions for humanoid robots or virtual agents seems accomplishable. Beyond that, automatic measurement also appears quite feasible with the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a blueprint for building machines that can appear charismatic, but also analyse the charisma of others. To this end, we first provide the psychological perspective including different models of charisma and behavioural cues of it. We then switch to conversational charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behaviour by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then name exemplary use cases of computational charismatic skills before switching to ethical aspects and concluding this overview and perspective on building charisma-enabled AI.

Learning 3D Human Pose Estimation from Dozens of Datasets using a Geometry-Aware Autoencoder to Bridge Between Skeleton Formats

István Sárándi , Alexander Hermans , Bastian Leibe

Categories: Computer Vision

2022-12-29

Deep learning-based 3D human pose estimation performs best when trained on large amounts of labeled data, making combined learning from many datasets an important research direction. One obstacle to this endeavor is that different datasets provide different skeleton formats, i.e., they do not label the same set of anatomical landmarks. There is little prior research on how to best supervise one model with such discrepant labels. We show that simply using separate output heads for different skeletons results in inconsistent depth estimates and insufficient information sharing across skeletons. As a remedy, we propose a novel affine-combining autoencoder (ACAE) method to perform dimensionality reduction on the number of landmarks. The discovered latent 3D points capture the redundancy among skeletons, enabling enhanced information sharing when used for consistency regularization. Our approach scales to an extreme multi-dataset regime, where we use 28 3D human pose datasets to supervise one model, which outperforms prior work on a range of benchmarks, including the challenging 3D Poses in the Wild (3DPW) dataset. Our code and models are available for research purposes.
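
The name "affine-combining" suggests that each latent 3D point is an affine combination of input landmarks, meaning a weighted sum whose weights add up to one. The sketch below shows only that building block, under this assumption; the function name, the weights, and the example points are hypothetical, and the abstract does not describe how the weights are learned or constrained.

```python
def affine_combine(points, weights, tolerance=1e-9):
    """Weighted sum of 3D points with weights that sum to one.

    Individual weights may be negative; only their sum is constrained.
    """
    if len(points) != len(weights):
        raise ValueError("need exactly one weight per point")
    if abs(sum(weights) - 1.0) > tolerance:
        raise ValueError("affine combination requires weights summing to 1")
    dim = len(points[0])
    return tuple(sum(w * p[d] for w, p in zip(weights, points))
                 for d in range(dim))


# Hypothetical landmarks and weights, chosen only for illustration.
landmarks = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
weights = [0.5, 0.25, 0.25]
latent = affine_combine(landmarks, weights)

# Shifting every landmark by the same offset shifts the result by that offset.
shifted = [(x + 1.0, y + 1.0, z + 1.0) for x, y, z in landmarks]
latent_shifted = affine_combine(shifted, weights)
```

Because the weights sum to one, translating the whole skeleton translates the combined point by the same amount, so the combination does not depend on where the coordinate origin is placed.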

Bayesian Interpolation with Deep Linear Networks

Boris Hanin , Alexander Zlokapa

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-29

This article concerns Bayesian inference using deep linear networks with output dimension one. In the interpolating (zero noise) regime we show that with Gaussian weight priors and MSE negative log-likelihood loss both the predictive posterior and the Bayesian model evidence can be written in closed form in terms of a class of meromorphic special functions called Meijer-G functions. These results are non-asymptotic and hold for any training dataset, network depth, and hidden layer widths, giving exact solutions to Bayesian interpolation using a deep Gaussian process with a Euclidean covariance at each layer. Through novel asymptotic expansions of Meijer-G functions, a rich new picture of the role of depth emerges. Specifically, we find that the posteriors in deep linear networks with data-independent priors are the same as in shallow networks with evidence-maximizing data-dependent priors. In this sense, deep linear networks make provably optimal predictions. We also prove that, starting from data-agnostic priors, Bayesian model evidence in wide networks is only maximized at infinite depth. This gives a principled reason to prefer deeper networks (at least in the linear case). Finally, our results show that with data-agnostic priors a novel notion of effective depth given by \[\#\text{hidden layers}\times\frac{\#\text{training data}}{\text{network width}}\] determines the Bayesian posterior in wide linear networks, giving rigorous new scaling laws for generalization error.
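
The effective depth quoted in this abstract is plain arithmetic on three quantities, so a short worked example may help. The function and parameter names below are illustrative, not taken from the paper.

```python
def effective_depth(num_hidden_layers, num_training_points, network_width):
    """Effective depth: hidden layers times training points over width."""
    if network_width <= 0:
        raise ValueError("network_width must be positive")
    return num_hidden_layers * num_training_points / network_width


# Hypothetical configurations that share the same effective depth.
shallow_narrow = effective_depth(10, 100, 1000)
deep_wide = effective_depth(20, 100, 2000)
```

Both configurations give an effective depth of 1.0: doubling the number of hidden layers while doubling the width leaves the quantity unchanged.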
